Overview

Dataset statistics

                                  Dataset A      Dataset B
Number of variables               12             12
Number of observations            446            446
Missing cells                     449            444
Missing cells (%)                 8.4%           8.3%
Duplicate rows                    0              0
Duplicate rows (%)                0.0%           0.0%
Total size in memory              45.3 KiB       45.3 KiB
Average record size in memory     104.0 B        104.0 B
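As a consistency check, the missing-cell share can be recomputed from the per-variable counts reported further down. Only Age and Cabin have missing values in either dataset, and each table has 12 × 446 cells. The helper names below are illustrative, and the record-size line only approximates the report because 45.3 KiB is already rounded.

```python
from typing import Mapping


def missing_cells_pct(missing: Mapping[str, int], n_vars: int, n_obs: int) -> float:
    """Percentage of empty cells in an n_vars x n_obs table."""
    total_cells = n_vars * n_obs
    return 100.0 * sum(missing.values()) / total_cells


def average_record_size(total_bytes: float, n_obs: int) -> float:
    """Mean in-memory size of one row, in bytes."""
    return total_bytes / n_obs


# Per-variable missing counts as reported; every other variable has 0 missing values.
missing_a = {"Age": 99, "Cabin": 350}
missing_b = {"Age": 93, "Cabin": 351}

pct_a = missing_cells_pct(missing_a, n_vars=12, n_obs=446)
pct_b = missing_cells_pct(missing_b, n_vars=12, n_obs=446)
print(f"Missing cells: A={pct_a:.1f}%  B={pct_b:.1f}%")

# 45.3 KiB is itself rounded, so this only approximates the reported 104.0 B.
print(f"Average record size: {average_record_size(45.3 * 1024, 446):.1f} B")
```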

Variable types

                                  Dataset A      Dataset B
Numeric                           5              5
Categorical                       4              4
Text                              3              3

Alerts

Dataset A | Dataset B | Type
Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation
Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation
Age has 99 (22.2%) missing values | Age has 93 (20.9%) missing values | Missing
Cabin has 350 (78.5%) missing values | Cabin has 351 (78.7%) missing values | Missing
PassengerId has unique values | PassengerId has unique values | Unique
Name has unique values | Name has unique values | Unique
SibSp has 306 (68.6%) zeros | SibSp has 312 (70.0%) zeros | Zeros
Parch has 346 (77.6%) zeros | Parch has 347 (77.8%) zeros | Zeros
Fare has 6 (1.3%) zeros | Fare has 6 (1.3%) zeros | Zeros
Alert not present in this dataset | Fare is highly overall correlated with Pclass | High Correlation
Alert not present in this dataset | Pclass is highly overall correlated with Fare | High Correlation

Reproduction

                                  Dataset A                     Dataset B
Analysis started                  2023-06-21 12:53:43.621477    2023-06-21 12:53:48.247868
Analysis finished                 2023-06-21 12:53:48.246091    2023-06-21 12:53:52.634410
Duration                          4.62 seconds                  4.39 seconds
Software version                  ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration            config.json                   config.json
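The Duration row is simply the difference between the two timestamps above it. The sketch below recomputes it with the standard `datetime` module; the `runs` dictionary is just a container for the reported values.

```python
from datetime import datetime


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    t0 = datetime.fromisoformat(started)
    t1 = datetime.fromisoformat(finished)
    return (t1 - t0).total_seconds()


runs = {
    "Dataset A": ("2023-06-21 12:53:43.621477", "2023-06-21 12:53:48.246091"),
    "Dataset B": ("2023-06-21 12:53:48.247868", "2023-06-21 12:53:52.634410"),
}

for name, (start, end) in runs.items():
    print(f"{name}: {duration_seconds(start, end):.2f} seconds")
```

The same timestamps show that the Dataset B analysis started about 1.8 ms after the Dataset A analysis finished. This means the two profiles were computed one after the other, not in parallel.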

Variables

PassengerId
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          446            446
Distinct (%)                      100.0%         100.0%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              435.98655      454.1861

                                  Dataset A      Dataset B
Minimum                           1              2
Maximum                           891            890
Zeros                             0              0
Zeros (%)                         0.0%           0.0%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           1              2
5-th percentile                   46.5           47.25
Q1                                204.5          240.25
median                            420.5          456.5
Q3                                660            669.75
95-th percentile                  855            854.75
Maximum                           891            890
Range                             890            888
Interquartile range (IQR)         455.5          429.5

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                260.29785      258.15885
Coefficient of variation (CV)     0.59703184     0.56839883
Kurtosis                          -1.1940922     -1.1837863
Mean                              435.98655      454.1861
Median Absolute Deviation (MAD)   225.5          215.5
Skewness                          0.087904493    -0.038801394
Sum                               194450         202567
Variance                          67754.97       66645.99
Monotonicity                      Not monotonic  Not monotonic
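Several rows in these tables can be derived from the others. CV is the standard deviation divided by the mean, Variance is the squared standard deviation, Sum is the mean times the number of non-missing values, and Range and IQR are differences of the quantiles. The sketch below checks these relationships against the PassengerId figures for Dataset A. `derived_stats` is a hypothetical helper, not part of the profiling library.

```python
def derived_stats(mean: float, std: float, n: int, q1: float, q3: float,
                  minimum: float, maximum: float) -> dict:
    """Recompute the report's derived figures from its primary statistics."""
    return {
        "cv": std / mean,          # coefficient of variation
        "variance": std ** 2,      # std is the sample SD, so this is the sample variance
        "sum": mean * n,           # n = number of non-missing values
        "range": maximum - minimum,
        "iqr": q3 - q1,
    }


# PassengerId, Dataset A (values copied from the tables above)
a = derived_stats(mean=435.98655, std=260.29785, n=446,
                  q1=204.5, q3=660, minimum=1, maximum=891)
print({k: round(v, 4) for k, v in a.items()})
```

Because the inputs are the rounded values printed in the report, the results agree with the printed CV and Variance to the displayed precision rather than exactly.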
Histogram with fixed size bins (bins=50)

Dataset A
Value               Count   Frequency (%)
819                 1       0.2%
546                 1       0.2%
864                 1       0.2%
543                 1       0.2%
318                 1       0.2%
420                 1       0.2%
49                  1       0.2%
512                 1       0.2%
505                 1       0.2%
499                 1       0.2%
Other values (436)  436     97.8%

Dataset B
Value               Count   Frequency (%)
400                 1       0.2%
97                  1       0.2%
191                 1       0.2%
121                 1       0.2%
465                 1       0.2%
71                  1       0.2%
746                 1       0.2%
329                 1       0.2%
847                 1       0.2%
759                 1       0.2%
Other values (436)  436     97.8%

Dataset A
Value               Count   Frequency (%)
1                   1       0.2%
6                   1       0.2%
7                   1       0.2%
9                   1       0.2%
10                  1       0.2%
12                  1       0.2%
14                  1       0.2%
15                  1       0.2%
17                  1       0.2%
21                  1       0.2%

Dataset B
Value               Count   Frequency (%)
2                   1       0.2%
3                   1       0.2%
4                   1       0.2%
5                   1       0.2%
7                   1       0.2%
8                   1       0.2%
10                  1       0.2%
14                  1       0.2%
15                  1       0.2%
16                  1       0.2%

Survived
Categorical

                                  Dataset A      Dataset B
Distinct                          2              2
Distinct (%)                      0.4%           0.4%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Value               Dataset A       Dataset B
0                   271             276
1                   175             170

Length

                                  Dataset A      Dataset B
Max length                        1              1
Median length                     1              1
Mean length                       1              1
Min length                        1              1

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  446            446
Distinct characters               2              2
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
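To make "category" and "block" concrete, the sketch below classifies the characters of one Name value (the first Dataset B sample). It uses Python's standard `unicodedata` module as a substitute; the report may look these properties up differently. Only the category codes that occur in this report are mapped to long names. The ASCII test is a simplification, because the standard library has no block or script lookup.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` per Unicode general category (long name)."""
    counts: Counter = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def is_ascii_block(text: str) -> bool:
    """True when every character lies in the Basic Latin (ASCII) block."""
    return all(ord(ch) < 0x80 for ch in text)


name = "Trout, Mrs. William H (Jessie L)"  # first Dataset B sample of Name
print(category_counts(name).most_common())
print(is_ascii_block(name))
```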

Unique

                                  Dataset A      Dataset B
Unique                            0              0
Unique (%)                        0.0%           0.0%

Sample

                                  Dataset A      Dataset B
1st row                           0              1
2nd row                           1              0
3rd row                           0              1
4th row                           1              0
5th row                           0              0

Common Values

Value               Dataset A       Dataset B
0                   271 (60.8%)     276 (61.9%)
1                   175 (39.2%)     170 (38.1%)

Length

Histogram of lengths of the category

Common Values (Plot)

Plot panels: Dataset A, Dataset B

Value               Dataset A       Dataset B
0                   271 (60.8%)     276 (61.9%)
1                   175 (39.2%)     170 (38.1%)

Most occurring characters

Value               Dataset A       Dataset B
0                   271 (60.8%)     276 (61.9%)
1                   175 (39.2%)     170 (38.1%)

Most occurring categories

Value               Dataset A       Dataset B
Decimal Number      446 (100.0%)    446 (100.0%)

Most frequent character per category

Decimal Number
Value               Dataset A       Dataset B
0                   271 (60.8%)     276 (61.9%)
1                   175 (39.2%)     170 (38.1%)

Most occurring scripts

Value               Dataset A       Dataset B
Common              446 (100.0%)    446 (100.0%)

Most frequent character per script

Common
Value               Dataset A       Dataset B
0                   271 (60.8%)     276 (61.9%)
1                   175 (39.2%)     170 (38.1%)

Most occurring blocks

Value               Dataset A       Dataset B
ASCII               446 (100.0%)    446 (100.0%)

Most frequent character per block

ASCII
Value               Dataset A       Dataset B
0                   271 (60.8%)     276 (61.9%)
1                   175 (39.2%)     170 (38.1%)

Pclass
Categorical

                                  Dataset A      Dataset B
Distinct                          3              3
Distinct (%)                      0.7%           0.7%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Value               Dataset A       Dataset B
3                   249             257
1                   108             103
2                   89              86

Length

                                  Dataset A      Dataset B
Max length                        1              1
Median length                     1              1
Mean length                       1              1
Min length                        1              1

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  446            446
Distinct characters               3              3
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A      Dataset B
Unique                            0              0
Unique (%)                        0.0%           0.0%

Sample

                                  Dataset A      Dataset B
1st row                           3              2
2nd row                           2              3
3rd row                           3              3
4th row                           3              2
5th row                           3              3

Common Values

Value               Dataset A       Dataset B
3                   249 (55.8%)     257 (57.6%)
1                   108 (24.2%)     103 (23.1%)
2                   89 (20.0%)      86 (19.3%)

Length

Histogram of lengths of the category

Common Values (Plot)

Plot panels: Dataset A, Dataset B

Value               Dataset A       Dataset B
3                   249 (55.8%)     257 (57.6%)
1                   108 (24.2%)     103 (23.1%)
2                   89 (20.0%)      86 (19.3%)

Most occurring characters

Value               Dataset A       Dataset B
3                   249 (55.8%)     257 (57.6%)
1                   108 (24.2%)     103 (23.1%)
2                   89 (20.0%)      86 (19.3%)

Most occurring categories

Value               Dataset A       Dataset B
Decimal Number      446 (100.0%)    446 (100.0%)

Most frequent character per category

Decimal Number
Value               Dataset A       Dataset B
3                   249 (55.8%)     257 (57.6%)
1                   108 (24.2%)     103 (23.1%)
2                   89 (20.0%)      86 (19.3%)

Most occurring scripts

Value               Dataset A       Dataset B
Common              446 (100.0%)    446 (100.0%)

Most frequent character per script

Common
Value               Dataset A       Dataset B
3                   249 (55.8%)     257 (57.6%)
1                   108 (24.2%)     103 (23.1%)
2                   89 (20.0%)      86 (19.3%)

Most occurring blocks

Value               Dataset A       Dataset B
ASCII               446 (100.0%)    446 (100.0%)

Most frequent character per block

ASCII
Value               Dataset A       Dataset B
3                   249 (55.8%)     257 (57.6%)
1                   108 (24.2%)     103 (23.1%)
2                   89 (20.0%)      86 (19.3%)

Name
Text

                                  Dataset A      Dataset B
Distinct                          446            446
Distinct (%)                      100.0%         100.0%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Length

                                  Dataset A      Dataset B
Max length                        82             65
Median length                     49             47
Mean length                       26.860987      26.697309
Min length                        14             12

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  11980          11907
Distinct characters               60             60
Distinct categories               7              7
Distinct scripts                  2              2
Distinct blocks                   1              1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A      Dataset B
Unique                            446            446
Unique (%)                        100.0%         100.0%

Sample

           Dataset A                           Dataset B
1st row    Holm, Mr. John Fredrik Alexander    Trout, Mrs. William H (Jessie L)
2nd row    Davies, Master. John Morgan Jr      Karlsson, Mr. Nils August
3rd row    Sage, Master. Thomas Henry          de Mulder, Mr. Theodore
4th row    Madsen, Mr. Fridtjof Arne           McKane, Mr. Peter David
5th row    Dahlberg, Miss. Gerda Ulrika        Alhomaki, Mr. Ilmari Rudolf

Dataset A
Value               Count   Frequency (%)
mr                  263     14.6%
miss                86      4.8%
mrs                 63      3.5%
william             35      1.9%
john                24      1.3%
master              24      1.3%
henry               21      1.2%
james               15      0.8%
charles             12      0.7%
joseph              11      0.6%
Other values (886)  1248    69.3%

Dataset B
Value               Count   Frequency (%)
mr                  263     14.7%
miss                89      5.0%
mrs                 63      3.5%
william             31      1.7%
john                26      1.5%
master              18      1.0%
george              17      0.9%
henry               16      0.9%
charles             14      0.8%
thomas              13      0.7%
Other values (894)  1241    69.3%
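The word tables suggest that each value is lowercased, split on whitespace, and stripped of surrounding punctuation. "Mr." is counted as mr, and in the Ticket tables "C.A." becomes c.a while the inner punctuation survives. The sketch below implements that rule in plain Python. The rule is inferred from the tables, not taken from the library's source.

```python
import string
from collections import Counter
from typing import Iterable, List


def words(text: str) -> List[str]:
    """Assumed tokenization: lowercase, split on whitespace, then strip
    punctuation from both ends of every token (inner punctuation is kept)."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return [tok for tok in tokens if tok]  # drop tokens that were only punctuation


def word_counts(values: Iterable[str]) -> Counter:
    """Word frequencies over a whole column."""
    counts: Counter = Counter()
    for value in values:
        counts.update(words(value))
    return counts


# The five Dataset A sample rows of Name.
samples_a = [
    "Holm, Mr. John Fredrik Alexander",
    "Davies, Master. John Morgan Jr",
    "Sage, Master. Thomas Henry",
    "Madsen, Mr. Fridtjof Arne",
    "Dahlberg, Miss. Gerda Ulrika",
]

print(word_counts(samples_a).most_common(3))
print(words("Trout, Mrs. William H (Jessie L)"))
print(words("C.A. 33112"), words("CA. 2343"))  # Ticket samples
```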

Most occurring characters

Dataset A
Value               Count   Frequency (%)
(space)             1356    11.3%
r                   998     8.3%
e                   883     7.4%
a                   821     6.9%
i                   642     5.4%
s                   640     5.3%
n                   635     5.3%
M                   571     4.8%
l                   522     4.4%
o                   507     4.2%
Other values (50)   4405    36.8%

Dataset B
Value               Count   Frequency (%)
(space)             1347    11.3%
r                   956     8.0%
e                   854     7.2%
a                   816     6.9%
n                   657     5.5%
i                   654     5.5%
s                   644     5.4%
M                   563     4.7%
l                   541     4.5%
o                   510     4.3%
Other values (50)   4365    36.7%

Most occurring categories

Value               Dataset A       Dataset B
Lowercase Letter    7706 (64.3%)    7665 (64.4%)
Uppercase Letter    1816 (15.2%)    1807 (15.2%)
Space Separator     1356 (11.3%)    1347 (11.3%)
Other Punctuation   951 (7.9%)      944 (7.9%)
Close Punctuation   71 (0.6%)       68 (0.6%)
Open Punctuation    71 (0.6%)       68 (0.6%)
Dash Punctuation    9 (0.1%)        8 (0.1%)

Most frequent character per category

Space Separator
Value               Dataset A       Dataset B
(space)             1356 (100.0%)   1347 (100.0%)

Lowercase Letter
Dataset A
Value               Count   Frequency (%)
r                   998     13.0%
e                   883     11.5%
a                   821     10.7%
i                   642     8.3%
s                   640     8.3%
n                   635     8.2%
l                   522     6.8%
o                   507     6.6%
t                   346     4.5%
h                   272     3.5%
Other values (16)   1440    18.7%

Dataset B
Value               Count   Frequency (%)
r                   956     12.5%
e                   854     11.1%
a                   816     10.6%
n                   657     8.6%
i                   654     8.5%
s                   644     8.4%
l                   541     7.1%
o                   510     6.7%
t                   334     4.4%
h                   254     3.3%
Other values (16)   1445    18.9%

Uppercase Letter
Dataset A
Value               Count   Frequency (%)
M                   571     31.4%
J                   119     6.6%
A                   115     6.3%
H                   107     5.9%
C                   96      5.3%
S                   93      5.1%
E                   84      4.6%
W                   76      4.2%
B                   68      3.7%
L                   55      3.0%
Other values (15)   432     23.8%

Dataset B
Value               Count   Frequency (%)
M                   563     31.2%
A                   111     6.1%
J                   107     5.9%
H                   97      5.4%
C                   87      4.8%
S                   87      4.8%
E                   83      4.6%
B                   70      3.9%
L                   70      3.9%
W                   66      3.7%
Other values (15)   466     25.8%

Other Punctuation
Value               Dataset A       Dataset B
.                   447 (47.0%)     446 (47.2%)
,                   446 (46.9%)     446 (47.2%)
"                   52 (5.5%)       48 (5.1%)
'                   5 (0.5%)        3 (0.3%)
/                   1 (0.1%)        1 (0.1%)

Close Punctuation
Value               Dataset A       Dataset B
)                   71 (100.0%)     68 (100.0%)

Open Punctuation
Value               Dataset A       Dataset B
(                   71 (100.0%)     68 (100.0%)

Dash Punctuation
Value               Dataset A       Dataset B
-                   9 (100.0%)      8 (100.0%)

Most occurring scripts

Value               Dataset A       Dataset B
Latin               9522 (79.5%)    9472 (79.5%)
Common              2458 (20.5%)    2435 (20.5%)

Most frequent character per script

Common
Value               Dataset A       Dataset B
(space)             1356 (55.2%)    1347 (55.3%)
.                   447 (18.2%)     446 (18.3%)
,                   446 (18.1%)     446 (18.3%)
)                   71 (2.9%)       68 (2.8%)
(                   71 (2.9%)       68 (2.8%)
"                   52 (2.1%)       48 (2.0%)
-                   9 (0.4%)        8 (0.3%)
'                   5 (0.2%)        3 (0.1%)
/                   1 (< 0.1%)      1 (< 0.1%)

Latin
Dataset A
Value               Count   Frequency (%)
r                   998     10.5%
e                   883     9.3%
a                   821     8.6%
i                   642     6.7%
s                   640     6.7%
n                   635     6.7%
M                   571     6.0%
l                   522     5.5%
o                   507     5.3%
t                   346     3.6%
Other values (41)   2957    31.1%

Dataset B
Value               Count   Frequency (%)
r                   956     10.1%
e                   854     9.0%
a                   816     8.6%
n                   657     6.9%
i                   654     6.9%
s                   644     6.8%
M                   563     5.9%
l                   541     5.7%
o                   510     5.4%
t                   334     3.5%
Other values (41)   2943    31.1%

Most occurring blocks

Value               Dataset A       Dataset B
ASCII               11980 (100.0%)  11907 (100.0%)

Most frequent character per block

ASCII
Dataset A
Value               Count   Frequency (%)
(space)             1356    11.3%
r                   998     8.3%
e                   883     7.4%
a                   821     6.9%
i                   642     5.4%
s                   640     5.3%
n                   635     5.3%
M                   571     4.8%
l                   522     4.4%
o                   507     4.2%
Other values (50)   4405    36.8%

Dataset B
Value               Count   Frequency (%)
(space)             1347    11.3%
r                   956     8.0%
e                   854     7.2%
a                   816     6.9%
n                   657     5.5%
i                   654     5.5%
s                   644     5.4%
M                   563     4.7%
l                   541     4.5%
o                   510     4.3%
Other values (50)   4365    36.7%

Sex
Categorical

                                  Dataset A      Dataset B
Distinct                          2              2
Distinct (%)                      0.4%           0.4%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Value               Dataset A       Dataset B
male                295             292
female              151             154

Length

                                  Dataset A      Dataset B
Max length                        6              6
Median length                     4              4
Mean length                       4.67713        4.690583
Min length                        4              4
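For a categorical column, the length and character statistics follow directly from the value counts. For example, the mean length for Dataset A is (4 × 295 + 6 × 151) / 446 = 2086 / 446 ≈ 4.67713, and 2086 is also the Total characters figure below. The sketch below reproduces these numbers. The helper names are illustrative.

```python
from collections import Counter
from typing import Dict


def length_stats(value_counts: Dict[str, int]) -> Dict[str, float]:
    """Length statistics of a categorical column, derived from its value counts."""
    n = sum(value_counts.values())
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    lengths = [len(value) for value in value_counts]
    return {
        "total_characters": total_chars,
        "mean_length": total_chars / n,
        "max_length": max(lengths),
        "min_length": min(lengths),
        "distinct_characters": len(set("".join(value_counts))),
    }


def char_counts(value_counts: Dict[str, int]) -> Counter:
    """Character frequencies, weighting each value by how often it occurs."""
    counts: Counter = Counter()
    for value, count in value_counts.items():
        for ch in value:
            counts[ch] += count
    return counts


sex_a = {"male": 295, "female": 151}
sex_b = {"male": 292, "female": 154}

print(length_stats(sex_a))
print(char_counts(sex_a).most_common())
```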

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  2086           2092
Distinct characters               5              5
Distinct categories               1              1
Distinct scripts                  1              1
Distinct blocks                   1              1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A      Dataset B
Unique                            0              0
Unique (%)                        0.0%           0.0%

Sample

                                  Dataset A      Dataset B
1st row                           male           female
2nd row                           male           male
3rd row                           male           male
4th row                           male           male
5th row                           female         male

Common Values

Value               Dataset A       Dataset B
male                295 (66.1%)     292 (65.5%)
female              151 (33.9%)     154 (34.5%)

Length

Histogram of lengths of the category

Common Values (Plot)

Plot panels: Dataset A, Dataset B

Value               Dataset A       Dataset B
male                295 (66.1%)     292 (65.5%)
female              151 (33.9%)     154 (34.5%)

Most occurring characters

Value               Dataset A       Dataset B
e                   597 (28.6%)     600 (28.7%)
m                   446 (21.4%)     446 (21.3%)
a                   446 (21.4%)     446 (21.3%)
l                   446 (21.4%)     446 (21.3%)
f                   151 (7.2%)      154 (7.4%)

Most occurring categories

Value               Dataset A       Dataset B
Lowercase Letter    2086 (100.0%)   2092 (100.0%)

Most frequent character per category

Lowercase Letter
Value               Dataset A       Dataset B
e                   597 (28.6%)     600 (28.7%)
m                   446 (21.4%)     446 (21.3%)
a                   446 (21.4%)     446 (21.3%)
l                   446 (21.4%)     446 (21.3%)
f                   151 (7.2%)      154 (7.4%)

Most occurring scripts

Value               Dataset A       Dataset B
Latin               2086 (100.0%)   2092 (100.0%)

Most frequent character per script

Latin
Value               Dataset A       Dataset B
e                   597 (28.6%)     600 (28.7%)
m                   446 (21.4%)     446 (21.3%)
a                   446 (21.4%)     446 (21.3%)
l                   446 (21.4%)     446 (21.3%)
f                   151 (7.2%)      154 (7.4%)

Most occurring blocks

Value               Dataset A       Dataset B
ASCII               2086 (100.0%)   2092 (100.0%)

Most frequent character per block

ASCII
Value               Dataset A       Dataset B
e                   597 (28.6%)     600 (28.7%)
m                   446 (21.4%)     446 (21.3%)
a                   446 (21.4%)     446 (21.3%)
l                   446 (21.4%)     446 (21.3%)
f                   151 (7.2%)      154 (7.4%)

Age
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          69             80
Distinct (%)                      19.9%          22.7%
Missing                           99             93
Missing (%)                       22.2%          20.9%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              29.211816      29.832861

                                  Dataset A      Dataset B
Minimum                           0.42           0.42
Maximum                           64             80
Zeros                             0              0
Zeros (%)                         0.0%           0.0%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB
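The Distinct (%) figures match only if the denominator excludes missing values. For Dataset A, 69 / (446 - 99) ≈ 19.9%, whereas 69 / 446 would give 15.5%. The sketch below applies that reading to Age and Cabin, the two columns with missing values. The helper name is illustrative.

```python
def pct_of_non_missing(count: int, n_obs: int, n_missing: int) -> float:
    """Express a count as a percentage of the non-missing values."""
    return 100.0 * count / (n_obs - n_missing)


# (distinct values, observations, missing values) as reported
reported = {
    "Age, Dataset A": (69, 446, 99),
    "Age, Dataset B": (80, 446, 93),
    "Cabin, Dataset A": (85, 446, 350),
    "Cabin, Dataset B": (83, 446, 351),
}

for label, (distinct, n_obs, n_missing) in reported.items():
    print(f"{label}: {pct_of_non_missing(distinct, n_obs, n_missing):.1f}%")
```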

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0.42           0.42
5-th percentile                   4              3.6
Q1                                20             20
median                            28             28.5
Q3                                38.5           39
95-th percentile                  53.7           58
Maximum                           64             80
Range                             63.58          79.58
Interquartile range (IQR)         18.5           19
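The 5-th percentile of Dataset B (3.6) can be reproduced from the ten smallest Age values listed later in this section. With 353 non-missing ages, the 5% position falls between the 18th and 19th sorted values (3 and 4), and linear interpolation gives 3 + 0.6 × (4 - 3) = 3.6. A nearest-rank rule would give 4, and a midpoint rule would give 3.5. This suggests the report uses linear interpolation, the NumPy/pandas default, although that is an inference. The helper below is hypothetical.

```python
import math
from typing import Dict


def low_quantile_from_smallest(smallest: Dict[float, int], n: int, q: float) -> float:
    """Quantile by linear interpolation between order statistics (the NumPy/pandas
    default), using only the smallest values of a column.

    `smallest` maps value -> count for the lowest values; `n` is the number of
    non-missing observations. Works only while both order statistics needed
    for `q` fall inside that known prefix of the sorted data."""
    prefix = sorted(v for v, c in smallest.items() for _ in range(c))
    pos = (n - 1) * q  # 0-based position in the sorted data
    lo = math.floor(pos)
    hi = lo + 1
    if hi >= len(prefix):
        raise ValueError("quantile lies beyond the listed smallest values")
    return prefix[lo] + (prefix[hi] - prefix[lo]) * (pos - lo)


# Ten smallest Age values (value: count), copied from the tables in this section.
age_min_a = {0.42: 1, 0.75: 1, 0.83: 1, 1: 4, 2: 6, 3: 2, 4: 5, 5: 2, 7: 3, 8: 2}
age_min_b = {0.42: 1, 0.75: 2, 0.83: 2, 0.92: 1, 1: 3, 2: 6, 3: 3, 4: 5, 5: 1, 7: 1}

# Non-missing counts: 446 - 99 (A) and 446 - 93 (B).
print(low_quantile_from_smallest(age_min_a, n=446 - 99, q=0.05))
print(low_quantile_from_smallest(age_min_b, n=446 - 93, q=0.05))
```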

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                13.721645      14.792112
Coefficient of variation (CV)     0.46972929     0.49583283
Kurtosis                          -0.24656624    0.20666063
Mean                              29.211816      29.832861
Median Absolute Deviation (MAD)   9              9.5
Skewness                          0.12739559     0.36664262
Sum                               10136.5        10531
Variance                          188.28355      218.80658
Monotonicity                      Not monotonic  Not monotonic

Histogram with fixed size bins (bins=50)

Dataset A
Value               Count   Frequency (%)
25                  14      3.1%
21                  13      2.9%
19                  13      2.9%
30                  13      2.9%
22                  12      2.7%
27                  12      2.7%
24                  11      2.5%
20                  11      2.5%
29                  10      2.2%
36                  10      2.2%
Other values (59)   228     51.1%
(Missing)           99      22.2%

Dataset B
Value               Count   Frequency (%)
30                  16      3.6%
19                  15      3.4%
28                  14      3.1%
22                  13      2.9%
18                  13      2.9%
25                  13      2.9%
32                  13      2.9%
24                  13      2.9%
21                  11      2.5%
31                  11      2.5%
Other values (70)   221     49.6%
(Missing)           93      20.9%

Dataset A
Value               Count   Frequency (%)
0.42                1       0.2%
0.75                1       0.2%
0.83                1       0.2%
1                   4       0.9%
2                   6       1.3%
3                   2       0.4%
4                   5       1.1%
5                   2       0.4%
7                   3       0.7%
8                   2       0.4%

Dataset B
Value               Count   Frequency (%)
0.42                1       0.2%
0.75                2       0.4%
0.83                2       0.4%
0.92                1       0.2%
1                   3       0.7%
2                   6       1.3%
3                   3       0.7%
4                   5       1.1%
5                   1       0.2%
7                   1       0.2%

SibSp
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          7              7
Distinct (%)                      1.6%           1.6%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              0.54484305     0.48206278

                                  Dataset A      Dataset B
Minimum                           0              0
Maximum                           8              8
Zeros                             306            312
Zeros (%)                         68.6%          70.0%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0              0
5-th percentile                   0              0
Q1                                0              0
median                            0              0
Q3                                1              1
95-th percentile                  2              2
Maximum                           8              8
Range                             8              8
Interquartile range (IQR)         1              1

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                1.2439545      1.04651
Coefficient of variation (CV)     2.2831428      2.1708998
Kurtosis                          19.53937       19.864923
Mean                              0.54484305     0.48206278
Median Absolute Deviation (MAD)   0              0
Skewness                          4.0519772      3.8528422
Sum                               243            215
Variance                          1.5474228      1.0951832
Monotonicity                      Not monotonic  Not monotonic
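SibSp takes only seven distinct values, so the value-count tables listed below determine the data completely, and its moments can be recomputed exactly. The sketch assumes the report uses the bias-corrected (adjusted Fisher-Pearson) skewness and excess-kurtosis estimators that pandas applies in `Series.skew` and `Series.kurt`. Under that assumption, the results agree with the table above. `moments_from_counts` is a hypothetical helper.

```python
import math
from typing import Dict


def moments_from_counts(value_counts: Dict[float, int]) -> Dict[str, float]:
    """Sample mean, SD, skewness and excess kurtosis from {value: count}.

    Skewness and kurtosis use the bias-corrected estimators that pandas'
    Series.skew() and Series.kurt() implement."""
    n = sum(value_counts.values())
    mean = sum(v * c for v, c in value_counts.items()) / n
    # Sums of powered deviations (not divided by n).
    m2 = sum(c * (v - mean) ** 2 for v, c in value_counts.items())
    m3 = sum(c * (v - mean) ** 3 for v, c in value_counts.items())
    m4 = sum(c * (v - mean) ** 4 for v, c in value_counts.items())
    skew = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5
    kurt = (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "std": math.sqrt(m2 / (n - 1)),  # sample standard deviation
        "skewness": skew,
        "kurtosis": kurt,
    }


# SibSp value counts as reported for each dataset
sibsp_a = {0: 306, 1: 105, 2: 13, 3: 6, 4: 7, 5: 2, 8: 7}
sibsp_b = {0: 312, 1: 101, 2: 12, 3: 8, 4: 8, 5: 2, 8: 3}

for name, counts in (("A", sibsp_a), ("B", sibsp_b)):
    stats = moments_from_counts(counts)
    print(name, {k: round(v, 6) for k, v in stats.items()})
```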
Histogram with fixed size bins (bins=7)

Dataset A
Value               Count   Frequency (%)
0                   306     68.6%
1                   105     23.5%
2                   13      2.9%
8                   7       1.6%
4                   7       1.6%
3                   6       1.3%
5                   2       0.4%

Dataset B
Value               Count   Frequency (%)
0                   312     70.0%
1                   101     22.6%
2                   12      2.7%
3                   8       1.8%
4                   8       1.8%
8                   3       0.7%
5                   2       0.4%
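The histogram captions use bins=7 for SibSp and bins=6 for Parch, but bins=50 for PassengerId, Age and Fare. This pattern is consistent with a rule of min(50, number of distinct values), although that rule is inferred from this report only. The sketch below builds equal-width bins under that assumption, similar in spirit to `numpy.histogram`. It also shows a side effect: 7 equal-width bins over 0 to 8 are about 1.14 wide, so the values 0 and 1 fall into the same bin.

```python
from typing import List


def fixed_width_histogram(values: List[float], max_bins: int = 50) -> List[int]:
    """Equal-width histogram whose bin count is min(max_bins, distinct values).

    The last bin is closed on the right, as in numpy.histogram."""
    n_bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        idx = int((x - lo) / width) if width else 0
        counts[min(idx, n_bins - 1)] += 1  # clamp the maximum into the last bin
    return counts


# SibSp, Dataset A, expanded from the value counts above
sibsp_a = {0: 306, 1: 105, 2: 13, 3: 6, 4: 7, 5: 2, 8: 7}
values = [v for v, c in sibsp_a.items() for _ in range(c)]
print(fixed_width_histogram(values))
```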

Dataset A
Value               Count   Frequency (%)
0                   306     68.6%
1                   105     23.5%
2                   13      2.9%
3                   6       1.3%
4                   7       1.6%
5                   2       0.4%
8                   7       1.6%

Dataset B
Value               Count   Frequency (%)
0                   312     70.0%
1                   101     22.6%
2                   12      2.7%
3                   8       1.8%
4                   8       1.8%
5                   2       0.4%
8                   3       0.7%

Parch
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          6              6
Distinct (%)                      1.3%           1.3%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              0.367713       0.36547085

                                  Dataset A      Dataset B
Minimum                           0              0
Maximum                           5              5
Zeros                             346            347
Zeros (%)                         77.6%          77.8%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0              0
5-th percentile                   0              0
Q1                                0              0
median                            0              0
Q3                                0              0
95-th percentile                  2              2
Maximum                           5              5
Range                             5              5
Interquartile range (IQR)         0              0

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                0.81515845     0.81754394
Coefficient of variation (CV)     2.2168333      2.2369607
Kurtosis                          10.240577      10.291075
Mean                              0.367713       0.36547085
Median Absolute Deviation (MAD)   0              0
Skewness                          2.8823125      2.9046286
Sum                               164            163
Variance                          0.6644833      0.66837809
Monotonicity                      Not monotonic  Not monotonic

Histogram with fixed size bins (bins=6)

Dataset A
Value               Count   Frequency (%)
0                   346     77.6%
1                   54      12.1%
2                   38      8.5%
5                   4       0.9%
4                   2       0.4%
3                   2       0.4%

Dataset B
Value               Count   Frequency (%)
0                   347     77.8%
1                   54      12.1%
2                   36      8.1%
5                   4       0.9%
3                   3       0.7%
4                   2       0.4%

Dataset A
Value               Count   Frequency (%)
0                   346     77.6%
1                   54      12.1%
2                   38      8.5%
3                   2       0.4%
4                   2       0.4%
5                   4       0.9%

Dataset B
Value               Count   Frequency (%)
0                   347     77.8%
1                   54      12.1%
2                   36      8.1%
3                   3       0.7%
4                   2       0.4%
5                   4       0.9%

Ticket
Text

                                  Dataset A      Dataset B
Distinct                          389            388
Distinct (%)                      87.2%          87.0%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Length

                                  Dataset A      Dataset B
Max length                        18             18
Median length                     17             17
Mean length                       6.6995516      6.7623318
Min length                        3              3

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  2988           3016
Distinct characters               35             32
Distinct categories               5              5
Distinct scripts                  2              2
Distinct blocks                   1              1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A      Dataset B
Unique                            344            344
Unique (%)                        77.1%          77.1%

Sample

                                  Dataset A      Dataset B
1st row                           C 7075         240929
2nd row                           C.A. 33112     350060
3rd row                           CA. 2343       345774
4th row                           C 17369        28403
5th row                           7552           SOTON/O2 3101287

Dataset A
Value               Count   Frequency (%)
pc                  29      5.2%
c.a                 12      2.1%
ca                  9       1.6%
a/5                 7       1.2%
2343                7       1.2%
2                   7       1.2%
ston/o              7       1.2%
382652              5       0.9%
soton/o.q           5       0.9%
w./c                4       0.7%
Other values (410)  470     83.6%

Dataset B
Value               Count   Frequency (%)
pc                  26      4.7%
c.a                 9       1.6%
a/5                 7       1.3%
2                   7       1.3%
ston/o              7       1.3%
ca                  6       1.1%
347082              6       1.1%
soton/o.q           5       0.9%
soton/oq            5       0.9%
ston/o2             5       0.9%
Other values (407)  476     85.2%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
3                   390     13.1%
1                   337     11.3%
2                   293     9.8%
7                   257     8.6%
6                   238     8.0%
4                   210     7.0%
0                   195     6.5%
5                   194     6.5%
9                   159     5.3%
8                   129     4.3%
Other values (25)   586     19.6%

Dataset B
Value               Count   Frequency (%)
3                   390     12.9%
1                   336     11.1%
2                   296     9.8%
7                   233     7.7%
4                   230     7.6%
6                   222     7.4%
0                   208     6.9%
5                   193     6.4%
9                   165     5.5%
8                   145     4.8%
Other values (22)   598     19.8%

Most occurring categories

Value               Dataset A       Dataset B
Decimal Number      2402 (80.4%)    2418 (80.2%)
Uppercase Letter    316 (10.6%)     323 (10.7%)
Other Punctuation   142 (4.8%)      153 (5.1%)
Space Separator     116 (3.9%)      113 (3.7%)
Lowercase Letter    12 (0.4%)       9 (0.3%)

Most frequent character per category

Decimal Number
Dataset A
Value               Count   Frequency (%)
3                   390     16.2%
1                   337     14.0%
2                   293     12.2%
7                   257     10.7%
6                   238     9.9%
4                   210     8.7%
0                   195     8.1%
5                   194     8.1%
9                   159     6.6%
8                   129     5.4%

Dataset B
Value               Count   Frequency (%)
3                   390     16.1%
1                   336     13.9%
2                   296     12.2%
7                   233     9.6%
4                   230     9.5%
6                   222     9.2%
0                   208     8.6%
5                   193     8.0%
9                   165     6.8%
8                   145     6.0%

Space Separator
Value               Dataset A       Dataset B
(space)             116 (100.0%)    113 (100.0%)

Other Punctuation
Value               Dataset A       Dataset B
.                   94 (66.2%)      102 (66.7%)
/                   48 (33.8%)      51 (33.3%)

Uppercase Letter
Dataset A
Value               Count   Frequency (%)
C                   75      23.7%
O                   48      15.2%
P                   42      13.3%
A                   40      12.7%
S                   37      11.7%
N                   19      6.0%
T                   17      5.4%
W                   9       2.8%
Q                   8       2.5%
I                   5       1.6%
Other values (6)    16      5.1%

Dataset B
Value               Count   Frequency (%)
O                   65      20.1%
C                   63      19.5%
P                   42      13.0%
S                   40      12.4%
A                   33      10.2%
N                   25      7.7%
T                   23      7.1%
Q                   10      3.1%
W                   7       2.2%
F                   5       1.5%
Other values (5)    10      3.1%

Lowercase Letter
Dataset A
Value               Count   Frequency (%)
s                   3       25.0%
a                   3       25.0%
r                   2       16.7%
i                   2       16.7%
l                   1       8.3%
e                   1       8.3%

Dataset B
Value               Count   Frequency (%)
a                   3       33.3%
r                   2       22.2%
i                   2       22.2%
s                   2       22.2%

Most occurring scripts

Value               Dataset A       Dataset B
Common              2660 (89.0%)    2684 (89.0%)
Latin               328 (11.0%)     332 (11.0%)

Most frequent character per script

Common
Dataset A
Value               Count   Frequency (%)
3                   390     14.7%
1                   337     12.7%
2                   293     11.0%
7                   257     9.7%
6                   238     8.9%
4                   210     7.9%
0                   195     7.3%
5                   194     7.3%
9                   159     6.0%
8                   129     4.8%
Other values (3)    258     9.7%

Dataset B
Value               Count   Frequency (%)
3                   390     14.5%
1                   336     12.5%
2                   296     11.0%
7                   233     8.7%
4                   230     8.6%
6                   222     8.3%
0                   208     7.7%
5                   193     7.2%
9                   165     6.1%
8                   145     5.4%
Other values (3)    266     9.9%

Latin
Dataset A
Value               Count   Frequency (%)
C                   75      22.9%
O                   48      14.6%
P                   42      12.8%
A                   40      12.2%
S                   37      11.3%
N                   19      5.8%
T                   17      5.2%
W                   9       2.7%
Q                   8       2.4%
I                   5       1.5%
Other values (12)   28      8.5%

Dataset B
Value               Count   Frequency (%)
O                   65      19.6%
C                   63      19.0%
P                   42      12.7%
S                   40      12.0%
A                   33      9.9%
N                   25      7.5%
T                   23      6.9%
Q                   10      3.0%
W                   7       2.1%
F                   5       1.5%
Other values (9)    19      5.7%

Most occurring blocks

Value               Dataset A       Dataset B
ASCII               2988 (100.0%)   3016 (100.0%)

Most frequent character per block

ASCII
Dataset A
Value               Count   Frequency (%)
3                   390     13.1%
1                   337     11.3%
2                   293     9.8%
7                   257     8.6%
6                   238     8.0%
4                   210     7.0%
0                   195     6.5%
5                   194     6.5%
9                   159     5.3%
8                   129     4.3%
Other values (25)   586     19.6%

Dataset B
Value               Count   Frequency (%)
3                   390     12.9%
1                   336     11.1%
2                   296     9.8%
7                   233     7.7%
4                   230     7.6%
6                   222     7.4%
0                   208     6.9%
5                   193     6.4%
9                   165     5.5%
8                   145     4.8%
Other values (22)   598     19.8%

Fare
Real number (ℝ)

                                  Dataset A      Dataset B
Distinct                          188            181
Distinct (%)                      42.2%          40.6%
Missing                           0              0
Missing (%)                       0.0%           0.0%
Infinite                          0              0
Infinite (%)                      0.0%           0.0%
Mean                              31.985706      28.221

                                  Dataset A      Dataset B
Minimum                           0              0
Maximum                           512.3292       263
Zeros                             6              6
Zeros (%)                         1.3%           1.3%
Negative                          0              0
Negative (%)                      0.0%           0.0%
Memory size                       7.0 KiB        7.0 KiB

Quantile statistics

                                  Dataset A      Dataset B
Minimum                           0              0
5-th percentile                   7.162525       7.225
Q1                                7.9031         7.8958
median                            13.89585       13.68125
Q3                                30.0708        30.0708
95-th percentile                  120            90
Maximum                           512.3292       263
Range                             512.3292       263
Interquartile range (IQR)         22.1677        22.175

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                50.411634      34.576378
Coefficient of variation (CV)     1.5760676      1.2252003
Kurtosis                          39.132238      13.121447
Mean                              31.985706      28.221
Median Absolute Deviation (MAD)   6.64585        6.13125
Skewness                          5.1942457      3.0872007
Sum                               14265.625      12586.566
Variance                          2541.3328      1195.5259
Monotonicity                      Not monotonic  Not monotonic

Histogram with fixed size bins (bins=50)

Dataset A
Value               Count   Frequency (%)
13                  21      4.7%
7.8958              21      4.7%
8.05                20      4.5%
26                  17      3.8%
7.75                17      3.8%
10.5                11      2.5%
7.925               9       2.0%
26.55               7       1.6%
7.2292              7       1.6%
8.6625              7       1.6%
Other values (178)  309     69.3%

Dataset B
Value               Count   Frequency (%)
8.05                28      6.3%
13                  24      5.4%
7.75                23      5.2%
7.8958              22      4.9%
26                  15      3.4%
7.925               10      2.2%
7.775               8       1.8%
7.2292              7       1.6%
10.5                7       1.6%
7.8542              7       1.6%
Other values (171)  295     66.1%

Dataset A
Value               Count   Frequency (%)
0                   6       1.3%
4.0125              1       0.2%
5                   1       0.2%
6.4375              1       0.2%
6.45                1       0.2%
6.4958              1       0.2%
6.75                1       0.2%
6.8583              1       0.2%
6.95                1       0.2%
6.975               1       0.2%

Dataset B
Value               Count   Frequency (%)
0                   6       1.3%
6.2375              1       0.2%
6.4375              1       0.2%
6.45                1       0.2%
6.75                1       0.2%
6.8583              1       0.2%
6.95                1       0.2%
6.975               2       0.4%
7.05                4       0.9%
7.125               3       0.7%

Cabin
Text

                                  Dataset A      Dataset B
Distinct                          85             83
Distinct (%)                      88.5%          87.4%
Missing                           350            351
Missing (%)                       78.5%          78.7%
Memory size                       7.0 KiB        7.0 KiB

Length

                                  Dataset A      Dataset B
Max length                        11             11
Median length                     3              3
Mean length                       3.4791667      3.3789474
Min length                        1              1

Characters and Unicode

                                  Dataset A      Dataset B
Total characters                  334            321
Distinct characters               18             18
Distinct categories               3              3
Distinct scripts                  2              2
Distinct blocks                   1              1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A      Dataset B
Unique                            75             71
Unique (%)                        78.1%          74.7%
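Distinct and Unique measure different things. Distinct counts how many different values occur. Unique counts how many values occur exactly once. Judging by the numbers, both percentages are taken over the non-missing values: 75 / (446 - 350) ≈ 78.1% and 71 / (446 - 351) ≈ 74.7%. The sketch below shows the distinction on a hypothetical toy column, not on data from this report. The helper names are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def distinct_and_unique(values: Iterable[Optional[str]]) -> Tuple[int, int]:
    """(distinct, unique) over the non-missing values; None marks a missing cell.

    distinct = how many different values occur;
    unique   = how many values occur exactly once."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


def pct_of_non_missing(count: int, n_obs: int, n_missing: int) -> float:
    """Express a count as a percentage of the non-missing values."""
    return 100.0 * count / (n_obs - n_missing)


# Hypothetical toy column, not taken from the report.
toy = ["B96", "B98", "B96", None, "C22", None, "E8"]
print(distinct_and_unique(toy))  # (4, 3): B96 repeats, B98/C22/E8 occur once

# Cabin Unique (%) from the reported counts.
print(f"{pct_of_non_missing(75, 446, 350):.1f}%  {pct_of_non_missing(71, 446, 351):.1f}%")
```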

Sample

                                  Dataset A      Dataset B
1st row                           D49            A10
2nd row                           E8             E33
3rd row                           B77            D17
4th row                           B5             E49
5th row                           C103           C93

Dataset A
Value               Count   Frequency (%)
b96                 3       2.8%
b98                 3       2.8%
e101                2       1.8%
e8                  2       1.8%
d                   2       1.8%
c124                2       1.8%
e44                 2       1.8%
g6                  2       1.8%
c26                 2       1.8%
c22                 2       1.8%
Other values (85)   87      79.8%

Dataset B
Value               Count   Frequency (%)
c22                 2       1.9%
b58                 2       1.9%
c26                 2       1.9%
c93                 2       1.9%
b20                 2       1.9%
e33                 2       1.9%
e101                2       1.9%
b60                 2       1.9%
e44                 2       1.9%
c126                2       1.9%
Other values (79)   84      80.8%

Most occurring characters

Dataset A
Value               Count   Frequency (%)
1                   33      9.9%
B                   32      9.6%
C                   31      9.3%
3                   29      8.7%
2                   25      7.5%
4                   24      7.2%
D                   21      6.3%
6                   20      6.0%
9                   19      5.7%
0                   17      5.1%
Other values (8)    83      24.9%

Dataset B
Value               Count   Frequency (%)
C                   32      10.0%
2                   32      10.0%
1                   28      8.7%
3                   24      7.5%
B                   23      7.2%
6                   22      6.9%
4                   22      6.9%
8                   20      6.2%
E                   19      5.9%
5                   18      5.6%
Other values (8)    81      25.2%

Most occurring categories

Value               Dataset A       Dataset B
Decimal Number      212 (63.5%)     208 (64.8%)
Uppercase Letter    109 (32.6%)     104 (32.4%)
Space Separator     13 (3.9%)       9 (2.8%)

Most frequent character per category

Decimal Number
Dataset A
Value               Count   Frequency (%)
1                   33      15.6%
3                   29      13.7%
2                   25      11.8%
4                   24      11.3%
6                   20      9.4%
9                   19      9.0%
0                   17      8.0%
8                   16      7.5%
5                   15      7.1%
7                   14      6.6%

Dataset B
Value               Count   Frequency (%)
2                   32      15.4%
1                   28      13.5%
3                   24      11.5%
6                   22      10.6%
4                   22      10.6%
8                   20      9.6%
5                   18      8.7%
0                   17      8.2%
9                   13      6.2%
7                   12      5.8%

Uppercase Letter
Dataset A
Value               Count   Frequency (%)
B                   32      29.4%
C                   31      28.4%
D                   21      19.3%
E                   12      11.0%
A                   6       5.5%
F                   4       3.7%
G                   3       2.8%

Dataset B
Value               Count   Frequency (%)
C                   32      30.8%
B                   23      22.1%
E                   19      18.3%
D                   17      16.3%
A                   8       7.7%
F                   4       3.8%
G                   1       1.0%

Space Separator
Value               Dataset A       Dataset B
(space)             13 (100.0%)     9 (100.0%)

Most occurring scripts

Value               Dataset A       Dataset B
Common              225 (67.4%)     217 (67.6%)
Latin               109 (32.6%)     104 (32.4%)

Most frequent character per script

Common
Dataset A
Value               Count   Frequency (%)
1                   33      14.7%
3                   29      12.9%
2                   25      11.1%
4                   24      10.7%
6                   20      8.9%
9                   19      8.4%
0                   17      7.6%
8                   16      7.1%
5                   15      6.7%
7                   14      6.2%

Dataset B
Value               Count   Frequency (%)
2                   32      14.7%
1                   28      12.9%
3                   24      11.1%
6                   22      10.1%
4                   22      10.1%
8                   20      9.2%
5                   18      8.3%
0                   17      7.8%
9                   13      6.0%
7                   12      5.5%

Latin
Dataset A
Value               Count   Frequency (%)
B                   32      29.4%
C                   31      28.4%
D                   21      19.3%
E                   12      11.0%
A                   6       5.5%
F                   4       3.7%
G                   3       2.8%

Dataset B
Value               Count   Frequency (%)
C                   32      30.8%
B                   23      22.1%
E                   19      18.3%
D                   17      16.3%
A                   8       7.7%
F                   4       3.8%
G                   1       1.0%

Most occurring blocks

Value               Dataset A       Dataset B
ASCII               334 (100.0%)    321 (100.0%)

Most frequent character per block

ASCII
Dataset A
Value               Count   Frequency (%)
1                   33      9.9%
B                   32      9.6%
C                   31      9.3%
3                   29      8.7%
2                   25      7.5%
4                   24      7.2%
D                   21      6.3%
6                   20      6.0%
9                   19      5.7%
0                   17      5.1%
Other values (8)    83      24.9%

Dataset B
Value               Count   Frequency (%)
C                   32      10.0%
2                   32      10.0%
1                   28      8.7%
3                   24      7.5%
B                   23      7.2%
6                   22      6.9%
4                   22      6.9%
8                   20      6.2%
E                   19      5.9%
5                   18      5.6%
Other values (8)    81      25.2%

Embarked
Categorical

 Dataset ADataset B
Distinct33
Distinct (%)0.7%0.7%
Missing00
Missing (%)0.0%0.0%
Memory size7.0 KiB7.0 KiB
S
317 
C
87 
Q
42 
S
324 
C
78 
Q
44 

Length

 Dataset ADataset B
Max length11
Median length11
Mean length11
Min length11

Characters and Unicode

 Dataset ADataset B
Total characters446446
Distinct characters33
Distinct categories11 ?
Distinct scripts11 ?
Distinct blocks11 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

 Dataset ADataset B
Unique00 ?
Unique (%)0.0%0.0%

Sample

 Dataset ADataset B
1st rowSS
2nd rowSS
3rd rowSS
4th rowSS
5th rowSS

Common Values

ValueCountFrequency (%)
S 317
71.1%
C 87
 
19.5%
Q 42
 
9.4%
ValueCountFrequency (%)
S 324
72.6%
C 78
 
17.5%
Q 44
 
9.9%

Length

2023-06-21T12:54:00.743850image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
Histogram of lengths of the category

Common Values (Plot)

Dataset A

2023-06-21T12:54:00.897240image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/

Dataset B

2023-06-21T12:54:01.046136image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
ValueCountFrequency (%)
s 317
71.1%
c 87
 
19.5%
q 42
 
9.4%
ValueCountFrequency (%)
s 324
72.6%
c 78
 
17.5%
q 44
 
9.9%

Most occurring characters

ValueCountFrequency (%)
S 317
71.1%
C 87
 
19.5%
Q 42
 
9.4%
ValueCountFrequency (%)
S 324
72.6%
C 78
 
17.5%
Q 44
 
9.9%

Most occurring categories

Dataset A
Value             Count   Frequency (%)
Uppercase Letter    446          100.0%

Dataset B
Value             Count   Frequency (%)
Uppercase Letter    446          100.0%

Most frequent character per category

Uppercase Letter
Dataset A
Value   Count   Frequency (%)
S         317           71.1%
C          87           19.5%
Q          42            9.4%

Dataset B
Value   Count   Frequency (%)
S         324           72.6%
C          78           17.5%
Q          44            9.9%

Most occurring scripts

Dataset A
Value   Count   Frequency (%)
Latin     446          100.0%

Dataset B
Value   Count   Frequency (%)
Latin     446          100.0%

Most frequent character per script

Latin
Dataset A
Value   Count   Frequency (%)
S         317           71.1%
C          87           19.5%
Q          42            9.4%

Dataset B
Value   Count   Frequency (%)
S         324           72.6%
C          78           17.5%
Q          44            9.9%

Most occurring blocks

Dataset A
Value   Count   Frequency (%)
ASCII     446          100.0%

Dataset B
Value   Count   Frequency (%)
ASCII     446          100.0%

Most frequent character per block

ASCII
Dataset A
Value   Count   Frequency (%)
S         317           71.1%
C          87           19.5%
Q          42            9.4%

Dataset B
Value   Count   Frequency (%)
S         324           72.6%
C          78           17.5%
Q          44            9.9%

Interactions

[Figure: 25 pairwise interaction plots, each with a Dataset A panel and a Dataset B panel]

Correlations

[Figure: correlation heatmaps, Dataset A and Dataset B]

Dataset A

             PassengerId     Age   SibSp   Parch    Fare  Survived  Pclass     Sex  Embarked
PassengerId        1.000   0.016  -0.040   0.005   0.012     0.000   0.000   0.128     0.000
Age                0.016   1.000  -0.142  -0.219   0.218     0.084   0.328   0.000     0.000
SibSp             -0.040  -0.142   1.000   0.377   0.461     0.207   0.154   0.172     0.152
Parch              0.005  -0.219   0.377   1.000   0.401     0.138   0.059   0.237     0.058
Fare               0.012   0.218   0.461   0.401   1.000     0.267   0.480   0.211     0.171
Survived           0.000   0.084   0.207   0.138   0.267     1.000   0.319   0.525     0.192
Pclass             0.000   0.328   0.154   0.059   0.480     0.319   1.000   0.117     0.250
Sex                0.128   0.000   0.172   0.237   0.211     0.525   0.117   1.000     0.115
Embarked           0.000   0.000   0.152   0.058   0.171     0.192   0.250   0.115     1.000
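The entries between numeric variables can be negative, for example Age and Parch at -0.219 in Dataset A. A common choice for such pairs is Spearman's rank correlation; this sketch assumes that is what the report used, since the report itself does not name the method. Spearman's ρ is the Pearson correlation of the two columns' ranks, with tied values sharing their average rank. Rows with a missing value in either column (such as the NaN ages) would have to be dropped first. The helper names are hypothetical:

```python
from math import sqrt


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError if a column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0
```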

Dataset B

             PassengerId     Age   SibSp   Parch    Fare  Survived  Pclass     Sex  Embarked
PassengerId        1.000   0.013  -0.038  -0.014  -0.008     0.105   0.000   0.000     0.000
Age                0.013   1.000  -0.137  -0.241   0.144     0.163   0.245   0.124     0.114
SibSp             -0.038  -0.137   1.000   0.416   0.489     0.184   0.159   0.218     0.076
Parch             -0.014  -0.241   0.416   1.000   0.404     0.179   0.052   0.228     0.081
Fare              -0.008   0.144   0.489   0.404   1.000     0.235   0.548   0.124     0.240
Survived           0.105   0.163   0.184   0.179   0.235     1.000   0.338   0.501     0.120
Pclass             0.000   0.245   0.159   0.052   0.548     0.338   1.000   0.114     0.255
Sex                0.000   0.124   0.218   0.228   0.124     0.501   0.114   1.000     0.096
Embarked           0.000   0.114   0.076   0.081   0.240     0.120   0.255   0.096     1.000
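Every entry in the Survived, Pclass, Sex and Embarked rows is non-negative in both matrices, which fits an association measure for categorical variables rather than a signed correlation. Assuming the report used Cramér's V for these pairs (an assumption; the library may also apply a bias correction that this sketch omits), the plain statistic is V = sqrt(χ² / (n · (min(r, c) - 1))), where χ² is Pearson's chi-squared statistic for the r × c contingency table of the two columns. The example columns are toy data, not rows from either dataset:

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Uncorrected Cramér's V between two categorical columns of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n  # count expected under independence
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return float("nan")  # undefined when either column is constant
    return sqrt(chi2 / (n * (k - 1)))


sex = ["male", "male", "female", "female"]
print(cramers_v(sex, [0, 0, 1, 1]))  # 1.0
print(cramers_v(sex, [0, 1, 0, 1]))  # 0.0
```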

Missing values

[Figure panels: Dataset A, Dataset B]
A simple visualization of nullity by column.

[Figure panels: Dataset A, Dataset B]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.

[Figure panels: Dataset A, Dataset B]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
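These captions match the descriptions used by the missingno library. Assuming the heatmap follows missingno's approach, nullity correlation is the Pearson correlation between two columns' is-missing indicators: +1 when the columns are always missing together, -1 when exactly one of them is missing in every row. A column that is never (or always) missing has no variance, so its correlation is undefined and missingno leaves such columns out. A stdlib sketch, using `None` to mark a missing cell (an assumption for illustration; the helper name is hypothetical):

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation of is-missing indicators; None marks a missing cell."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a column that is never or always missing
    return cov / sqrt(var_a * var_b)


# Toy Age and Cabin columns.
age = [22.0, None, 26.0, None]
cabin = [None, None, "C85", None]
print(nullity_correlation(age, cabin))
```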

Sample

Dataset A

     PassengerId  Survived  Pclass  Name                                  Sex     Age   SibSp  Parch  Ticket      Fare     Cabin  Embarked
818  819          0         3       Holm, Mr. John Fredrik Alexander      male    43.0  0      0      C 7075      6.4500   NaN    S
549  550          1         2       Davies, Master. John Morgan Jr        male    8.0   1      1      C.A. 33112  36.7500  NaN    S
159  160          0         3       Sage, Master. Thomas Henry            male    NaN   8      2      CA. 2343    69.5500  NaN    S
127  128          1         3       Madsen, Mr. Fridtjof Arne             male    24.0  0      0      C 17369     7.1417   NaN    S
882  883          0         3       Dahlberg, Miss. Gerda Ulrika          female  22.0  0      0      7552        10.5167  NaN    S
722  723          0         2       Gillespie, Mr. William Henry          male    34.0  0      0      12233       13.0000  NaN    S
797  798          1         3       Osman, Mrs. Mara                      female  31.0  0      0      349244      8.6833   NaN    S
14   15           0         3       Vestrom, Miss. Hulda Amanda Adolfina  female  14.0  0      0      350406      7.8542   NaN    S
603  604          0         3       Torber, Mr. Ernst William             male    44.0  0      0      364511      8.0500   NaN    S
517  518          0         3       Ryan, Mr. Patrick                     male    NaN   0      0      371110      24.1500  NaN    Q

Dataset B

     PassengerId  Survived  Pclass  Name                                                 Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
399  400          1         2       Trout, Mrs. William H (Jessie L)                     female  28.0  0      0      240929            12.6500  NaN    S
478  479          0         3       Karlsson, Mr. Nils August                            male    22.0  0      0      350060            7.5208   NaN    S
286  287          1         3       de Mulder, Mr. Theodore                              male    30.0  0      0      345774            9.5000   NaN    S
397  398          0         2       McKane, Mr. Peter David                              male    46.0  0      0      28403             26.0000  NaN    S
840  841          0         3       Alhomaki, Mr. Ilmari Rudolf                          male    20.0  0      0      SOTON/O2 3101287  7.9250   NaN    S
406  407          0         3       Widegren, Mr. Carl/Charles Peter                     male    51.0  0      0      347064            7.7500   NaN    S
583  584          0         1       Ross, Mr. John Hugo                                  male    36.0  0      0      13049             40.1250  A10    C
356  357          1         1       Bowerman, Miss. Elsie Edith                          female  22.0  0      1      113505            55.0000  E33    S
862  863          1         1       Swift, Mrs. Frederick Joel (Margaret Welles Barron)  female  48.0  0      0      17466             25.9292  D17    S
324  325          0         3       Sage, Mr. George John Jr                             male    NaN   8      2      CA. 2343          69.5500  NaN    S

Dataset A

     PassengerId  Survived  Pclass  Name                                            Sex     Age   SibSp  Parch  Ticket             Fare      Cabin  Embarked
382  383          0         3       Tikkanen, Mr. Juho                              male    32.0  0      0      STON/O 2. 3101293  7.9250    NaN    S
31   32           1         1       Spencer, Mrs. William Augustus (Marie Eugenie)  female  NaN   1      0      PC 17569           146.5208  B78    C
453  454          1         1       Goldenberg, Mr. Samuel L                        male    49.0  1      0      17453              89.1042   C92    C
467  468          0         1       Smart, Mr. John Montgomery                      male    56.0  0      0      113792             26.5500   NaN    S
251  252          0         3       Strom, Mrs. Wilhelm (Elna Matilda Persson)      female  29.0  1      1      347054             10.4625   G6     S
195  196          1         1       Lurette, Miss. Elise                            female  58.0  0      0      PC 17569           146.5208  B80    C
150  151          0         2       Bateman, Rev. Robert James                      male    51.0  0      0      S.O.P. 1166        12.5250   NaN    S
276  277          0         3       Lindblom, Miss. Augusta Charlotta               female  45.0  0      0      347073             7.7500    NaN    S
375  376          1         1       Meyer, Mrs. Edgar Joseph (Leila Saks)           female  NaN   1      0      PC 17604           82.1708   NaN    C
479  480          1         3       Hirvonen, Miss. Hildur E                        female  2.0   0      1      3101298            12.2875   NaN    S

Dataset B

     PassengerId  Survived  Pclass  Name                                            Sex     Age   SibSp  Parch  Ticket      Fare      Cabin    Embarked
857  858          1         1       Daly, Mr. Peter Denis                           male    51.0  0      0      113055      26.5500   E17      S
30   31           0         1       Uruchurtu, Don. Manuel E                        male    40.0  0      0      PC 17601    27.7208   NaN      C
40   41           0         3       Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  female  40.0  1      0      7546        9.4750    NaN      S
370  371          1         1       Harder, Mr. George Achilles                     male    25.0  1      0      11765       55.4417   E50      C
660  661          1         1       Frauenthal, Dr. Henry William                   male    50.0  2      0      PC 17611    133.6500  NaN      S
681  682          1         1       Hassab, Mr. Hammad                              male    27.0  0      0      PC 17572    76.7292   D49      C
237  238          1         2       Collyer, Miss. Marjorie "Lottie"                female  8.0   0      2      C.A. 31921  26.2500   NaN      S
418  419          0         2       Matthews, Mr. William John                      male    30.0  0      0      28228       13.0000   NaN      S
390  391          1         1       Carter, Mr. William Ernest                      male    36.0  1      2      113760      120.0000  B96 B98  S
639  640          0         3       Thorneycroft, Mr. Percival                      male    NaN   1      0      376564      16.1000   NaN      S

Duplicate rows

Dataset A

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId  Survived  Pclass  Name  Sex  Age  SibSp  Parch  Ticket  Fare  Cabin  Embarked  # duplicates
Dataset does not contain duplicate rows.
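Both datasets report no duplicate rows, which is expected when every row carries a distinct PassengerId. A duplicate check compares whole rows; a minimal sketch that returns each repeated row with its occurrence count, mirroring the "# duplicates" column, assuming rows are tuples of hashable values (`duplicate_rows` is a hypothetical helper and the rows are toy data):

```python
from collections import Counter


def duplicate_rows(rows):
    """Map each row that appears more than once to its number of occurrences.

    Note: float('nan') != float('nan'), so rows holding separately created
    NaN values will not be matched as duplicates by this sketch.
    """
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [
    (1, "S", 7.25),
    (2, "C", 71.2833),
    (1, "S", 7.25),
]
print(duplicate_rows(rows))  # {(1, 'S', 7.25): 2}
print(duplicate_rows(rows[:2]))  # {}
```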